HotSpot 3.1 Changes
-------------------
This version of HotSpot incorporates significant performance enhancements
to the block model with the most important of them being a contribution
from Greg Link (http://www.drgreglink.com/), who recently obtained his PhD
from Penn State University, and is now with the York College of Pennyslvania.
This contribution comes from his HotSpot-based thermal modeling tool
HS3d (http://www.cse.psu.edu/~link/hs3d.html). Where available, it replaces
HotSpot's matrix and vector function calls with calls to vendor-provided
fast math libraries conforming to the Basic Linear Algebra Subprograms
(BLAS: http://www.netlib.org/blas/) and Linear Algebra Package (LAPACK:
http://www.netlib.org/lapack/) standards. This provides significant increase
in performance for large floorplans (with say, greater than 100 units) where
the resultant matrices are huge.

Other improvements include an algorithmic speedup of the block model's
steady-state solver and conversion of the transient Runge-Kutta (RK4) solver
from a fixed step-size method to an adaptive one. The adaptive RK4 solver,
which is based upon the discussion in the "Numerical Recipes in C" book,
(http://www.library.cornell.edu/nr/bookcpdf.html) adjusts its step size
to the need of the problem. When the temperature changes rapidly (i.e., the
slope is steep), it uses small steps and when the temperature curve is flat
(the slope is shallow), it rushes through the solution with large steps. In
each step, it ascertains its own accuracy by computing the temperature twice
- once with the desired step size and another time with two half-steps. When
the difference between these two computations is greater than an allowable
limit, the step-size is decreased and vice-versa.

Installation
------------
1. Download the HotSpot tar ball (HotSpot-3.1.tar.gz) following the
instructions at http://lava.cs.virginia.edu/HotSpot

2. Unzip and untar the file using the following commands
	a) gunzip HotSpot-3.1.tar.gz
	b) tar -xvf HotSpot-3.1.tar

3. Go to the HotSpot installation directory
	a) cd HotSpot-3.1/
	
4. Uncomment the lines corresponding to your installation of the BLAS/LAPACK
vendor libraries (see http://www.netlib.org/blas/faq.html#5) and set the path
and compiler options corresponding to your library. This version of HotSpot has
code integrated for Intel Math Kernel Library, AMD Core Math Library, Apple
Velocity Engine and Sun Performance Library. Extending it to other vendors
should be straightforward. For such an extension, apart from the user guide
from those vendors, http://www.netlib.org/blas/blast-forum/cinterface.pdf might
also be useful as it provides useful description of the C interface to BLAS.

5. Build HotSpot
	a) make

7. To remove all the outputs of compilation, type 'make clean'. To remove the
object files alone, type 'make cleano'. To view the list of files HotSpot needs
for proper working, type 'make filelist'. To compile for debugging, use 'make
DEBUG=1'. To compile for using with a profiler (e.g. gprof), use 'make DEBUG=2'.

8. Known compatibility issues:
	a) For old AMD machines without SSE2 instructions, the most recent version
	of ACML available is 3.1.0. On such machines, the ACML library works with
	GCC 4.0 but not with GCC 4.1.
	b) Linking with Sun Performance Library on old Solaris machines fails as
	'libmtsk' was not found.
	(e.g. see http://supportforum.sun.com/jive/thread.jspa?threadID=72529)
	Installing patch 117560 from Sun's patch finder
	(http://sunsolve.sun.com/pub-cgi/show.pl?target=patches/patch-access)
	solves the problem.

Guidelines for Adaptive RK4
---------------------------
Adaptive RK4 adapts its step size by performing at each step size, almost
twice the computation of non-adaptive RK4. For small sampling intervals, there
is very little opportunity for gains due to adaptivity as there are only a few
steps to take even with a fixed step-size. Hence, adaptive RK4 might be slower
than the non-adaptive counterpart at small sampling intervals. However, at
larger intervals, even gains of multiple orders of magnitude are possible due
to adaptivity. Hence, it is important to choose the correct sampling interval
corresponding to your specific application.

A typical use of HotSpot in a processor simulator involves two runs - once with
default initial temperatures and a second time with initial temperatures set to
the steady-state temperatures computed in the previous run. During the second
run, since the transient temperatures are close to the steady-state, the slopes
are shallow and adaptive RK4 runs faster as it adapts itself into larger
step-sizes. If the average power values are available, the first run can be
entirely avoided as its purpose is only to compute the steady-state
temperatures. To facilitate this in the trace-driven 'hotspot' binary, its
command line options have been modified in this version. The "-o <file>" option
that outputs transient temperatures to the temperature trace file, has been
changed from being mandatory to optional. When the option is not provided,
transient simulation is avoided and only the steady-state temperatures are
computed based upon the average power values computed from the instantaneous
values in the power trace file.
